A Probabilistic Approach to Persian Ezafe Recognition

نویسندگان

  • Habibollah Asghari
  • Jalal Maleki
  • Heshaam Faili
چکیده

In this paper, we investigate the problem of Ezafe recognition in Persian language. Ezafe is an unstressed vowel that is usually not written, but is intelligently recognized and pronounced by human. Ezafe marker can be placed into noun phrases, adjective phrases and some prepositional phrases linking the head and modifiers. Ezafe recognition in Persian is indeed a homograph disambiguation problem, which is a useful task for some language applications in Persian like TTS. In this paper, Part of Speech tags augmented by Ezafe marker (POSE) have been used to train a probabilistic model for Ezafe recognition. In order to build this model, a ten million word tagged corpus was used for training the system. For building the probabilistic model, three different approaches were used; Maximum Entropy POSE tagger, Conditional Random Fields (CRF) POSE tagger and also a statistical machine translation approach based on parallel corpus. It is shown that comparing to previous works, the use of CRF POSE tagger can achieve outstanding results.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Selection of Reference Pages in Wikipedia for Improving Targeted Entities Disambiguation

A 59 A Knowledge-based Representation for Cross-Language Document Retrieval and Categorization Marc Franco-Salvador, Paolo Rosso and Roberto Navigli A 10170 A Probabilistic Approach to Persian Ezafe Recognition Habibollah Asghari, Heshaam Faili and Jalal Maleki A 10137 Acquiring a Dictionary of Emotion-Provoking Events Hoa Trong Vu, Graham Neubig, Sakriani Sakti, Tomoki Toda and Satoshi Nakamur...

متن کامل

A Hybrid Algorithm for Recognizing the Position of Ezafe Constructions in Persian Texts

17- Abtract — In the Persian language, an Ezafe construction is a linking element which joins the head of a phrase to its modifiers. The Ezafe in its simplest form is pronounced as –e, but generally not indicated in writing. Determining the position of an Ezafe is advantageous for disambiguating the boundary of the syntactic phrases which is a fundamental task in most natural language processi...

متن کامل

Persian Ezafe as a 'figure' Marker: a Unified Analysis

This article is a conceptual exploration of Ezafe in Modern Persian. I will consider cases where Ezafe seems to be conceptually non-neutral. In certain cases of the ‘X-e Y’ construction, X and Y can change their places with a shift in meaning while they are apparently frozen in their positions in other cases of Ezafe construction. The question to address here is if the Ezafe element -e marks an...

متن کامل

On the Importance of Ezafe Construction in Persian Parsing

Ezafe construction is an idiosyncratic phenomenon in the Persian language. It is a good indicator for phrase boundaries and dependency relations but mostly does not appear in the text. In this paper, we show that adding information about Ezafe construction can give 4.6% relative improvement in dependency parsing and 9% relative improvement in shallow parsing. For evaluation purposes, Ezafe tags...

متن کامل

Persian Handwritten Digit Recognition Using Particle Swarm Probabilistic Neural Network

Handwritten digit recognition can be categorized as a classification problem. Probabilistic Neural Network (PNN) is one of the most effective and useful classifiers, which works based on Bayesian rule. In this paper, in order to recognize Persian (Farsi) handwritten digit recognition, a combination of intelligent clustering method and PNN has been utilized. Hoda database, which includes 80000 P...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014